Failure resistant multiple computer system and method

ABSTRACT

The updating of only some memory locations in a multiple computer environment, in which at least one application program (50) executes simultaneously on a plurality of computers M1, M2 . . . Mn each of which has a local memory, is disclosed. Memory locations (A, B, D, E, X) in said local memory are categorized into two groups. The first group of memory locations (X1, X2, . . . Xn, A1, A2, . . . An) are each present in other computers. The second group of memory locations (B, E) are each present only in the computer whose local memory includes the memory location. Only changes to the contents of memory locations in the first group are transmitted to all other computers. A computer failure detection mechanism is disclosed to prevent updating of any first group memory locations of any failed computer.

CROSS-REFERENCE TO RELATED APPLICATIONS

This is a divisional of U.S. application Ser. No. 11/583,351 filed Oct. 18, 2006 entitled “Failure Resistant Multiple Computer System and Method,” and claims the benefit of priority to U.S. Provisional Application No. 60/730,512 entitled “Failure Resistant Multiple Computer System and Method” filed Oct. 25, 2005; which is hereby incorporated by reference.

FIELD OF THE INVENTION

The present invention relates to computing and, in particular, to the simultaneous operation of a plurality of computers interconnected via a communications network.

BACKGROUND ART

International Patent Application No. PCT/AU2005/000580 (Attorney Ref 5027F-WO) published under WO 2005/103926 (to which U.S. patent application Ser. No. 11/111,946, published under No. 2005-0262313, corresponds) in the name of the present applicant, discloses how different portions of an application program written to execute on only a single computer can be operated substantially simultaneously on a corresponding different one of a plurality of computers. That simultaneous operation has not been commercially used as of the priority date of the present application. International Patent Application Nos. PCT/AU2005/001641 (Attorney Ref 5027F-D1-WO), to which U.S. patent application Ser. No. 11/259,885 entitled “Computer Architecture Method of Operation for Multi-Computer Distributed Processing and Co-ordinated Memory and Asset Handling” corresponds, and PCT/AU2006/000532 (Attorney Ref: 5027F-D2-WO), in the name of the present applicant and unpublished as at the priority date of the present application, also disclose further details. The contents of each of the abovementioned prior application(s) are hereby incorporated into the present application by cross reference for all purposes.

Briefly stated, the abovementioned patent specifications disclose that at least one application program written to be operated on only a single computer can be simultaneously operated on a number of computers each with independent local memory. The memory locations required for the operation of that program are replicated in the independent local memory of each computer. On each occasion on which the application program writes new data to any replicated memory location, that new data is transmitted and stored at each corresponding memory location of each computer. Thus apart from the possibility of transmission delays, each computer has a local memory the contents of which are substantially identical to the local memory of each other computer and are updated to remain so. Since all application programs, in general, read data much more frequently than they cause new data to be written, the abovementioned arrangement enables very substantial advantages in computing speed to be achieved. In particular, the stratagem enables two or more commodity computers interconnected by a commodity communications network to be operated simultaneously running under the application program written to be executed on only a single computer.
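The replication scheme just described can be sketched in a few lines. This is a minimal, single-process illustration assuming a hypothetical Machine class whose peers are plain object references rather than network connections; the names Machine, write, read and receiveUpdate are illustrative only and are not taken from any disclosed implementation. Reads are served entirely from local memory, while each write is copied to the corresponding location on every other machine.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Main {
    // Hypothetical model of one machine with its own independent local memory.
    static class Machine {
        final String name;
        final Map<String, Integer> localMemory = new HashMap<>();
        final List<Machine> peers = new ArrayList<>();

        Machine(String name) { this.name = name; }

        // Reads are served from local memory only: no network traffic.
        Integer read(String location) { return localMemory.get(location); }

        // A write updates the local copy, then sends the new value to the
        // corresponding location on every peer. Here this is a direct call;
        // over a real network the peers could briefly hold stale values.
        void write(String location, int value) {
            localMemory.put(location, value);
            for (Machine peer : peers) {
                peer.receiveUpdate(location, value);
            }
        }

        // Applies an update that originated on another machine.
        void receiveUpdate(String location, int value) {
            localMemory.put(location, value);
        }
    }

    public static void main(String[] args) {
        List<Machine> machines = List.of(new Machine("M1"), new Machine("M2"), new Machine("M3"));
        for (Machine m : machines) {
            for (Machine other : machines) {
                if (other != m) m.peers.add(other);
            }
        }
        machines.get(0).write("A", 42);
        for (Machine m : machines) {
            System.out.println(m.name + " A=" + m.read("A"));
        }
    }
}
```

Because every read is local, the cost of coherence is paid only on writes, which is why the read-heavy behaviour of typical programs favours this arrangement.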

In many situations, the above-mentioned arrangements work satisfactorily. This applies particularly where the programmer is aware that there may be updating delays and so can adjust the flow of the program to account for this. However, there are situations in which the use of stale contents or values instead of the latest content can create problems.

The genesis of the present invention is a desire to at least partially overcome the abovementioned difficulty.

SUMMARY OF THE INVENTION

In accordance with a first aspect of the present invention there is disclosed a failure resistant method of operating a plurality of computers each with their corresponding independent local memory, each simultaneously operating an application program, and each being connected via a communications network to permit updating of corresponding memory locations, said method comprising the steps of:

(i) categorizing the memory locations of said local memories into a first reachability category in which the local memory locations are accessible by selected ones, or all, of said computers and therefore require updating via said communications network with changes to corresponding memory locations of the other computers having access, to maintain substantial memory coherence, and into a second category in which the local memory locations are accessible only by the local computer and therefore no updating is required,
(ii) detecting failure of any one of said multiple computers, and
(iii) modifying said first category to remove therefrom, if present, reference to accessibility by the failed computer, whereby no attempt is made to update any first category locations of said failed computer.
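The three steps above can be sketched as operations on a reachability table of the kind shown in FIGS. 4 and 7. In this minimal sketch, which assumes a hypothetical map from location name to the set of machines holding a replica, a location held by more than one machine is treated as first category and a location held by only one machine as second category. Failure detection itself is assumed and is represented only by a call to the hypothetical method onMachineFailure.

```java
import java.util.*;

public class Main {
    // Hypothetical reachability table: memory location -> machines holding a replica.
    static final Map<String, Set<String>> reachability = new TreeMap<>();

    // Step (i): more than one holder means first category (needs updating);
    // a single holder means second category (local only, no updating).
    static boolean isFirstCategory(String location) {
        return reachability.getOrDefault(location, Set.of()).size() > 1;
    }

    // Machines that must receive an update when 'writer' changes 'location'.
    static Set<String> updateTargets(String location, String writer) {
        if (!isFirstCategory(location)) return Set.of();
        Set<String> targets = new TreeSet<>(reachability.get(location));
        targets.remove(writer);
        return targets;
    }

    // Steps (ii) and (iii): once a failure has been detected, remove every
    // reference to the failed machine so no update is ever addressed to it.
    static void onMachineFailure(String failed) {
        for (Set<String> holders : reachability.values()) {
            holders.remove(failed);
        }
    }

    public static void main(String[] args) {
        reachability.put("A", new TreeSet<>(List.of("M1", "M2", "M3")));
        reachability.put("B", new TreeSet<>(List.of("M1")));
        reachability.put("X", new TreeSet<>(List.of("M1", "M2")));

        System.out.println(updateTargets("A", "M1")); // [M2, M3]
        onMachineFailure("M2");
        System.out.println(updateTargets("A", "M1")); // [M3]
        System.out.println(isFirstCategory("X"));      // false
    }
}
```

Once M2 is removed, location X remains only on M1 and so, in this sketch, falls into the second category, where no updating is required.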

In accordance with a second aspect of the present invention there is disclosed a failure resistant multiple computer system in which a plurality of computers each has a corresponding independent local memory, each simultaneously operates a corresponding portion of an application program written to be executed only on a single computer, and each is connected via a communications network to permit updating of corresponding memory locations, said system including a reachability means to categorize memory locations of said local memories into a first category in which the local memory locations are replicated in selected ones, or all, of said computers and therefore require updating via said communications network with changes to corresponding memory locations of other computers, to maintain substantial memory coherence, and into a second category in which the local memory locations are present only in the local computer and therefore no updating is required, and wherein said system further includes a failure detection means connected to each said computer to detect failure of any one of said multiple computers, and a reachability modifier connected to said failure detection means and to said reachability means to modify said reachability means by modifying said first category to remove therefrom, if present, any reference to the failed computer, whereby no attempt is made to update any first category memory locations of said failed computer.

In accordance with a third aspect of the present invention there is disclosed a computer program product comprising a set of program instructions stored in a storage medium and operable to permit a plurality of computers to carry out the above-mentioned method.

In accordance with a fourth aspect of the present invention there is disclosed a plurality of computers interconnected via a communications network and operable to ensure carrying out the abovementioned method.

In accordance with a fifth aspect of the present invention there is disclosed a single computer adapted to co-operate with at least one other computer to carry out the above method or form the above computer system.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will now be described with reference to the drawings in which:

FIG. 1A is a schematic illustration of a prior art computer arranged to operate JAVA code and thereby constitute a single JAVA virtual machine,

FIG. 1B is a drawing similar to FIG. 1A but illustrating the initial loading of code,

FIG. 1C illustrates the interconnection of a multiplicity of computers each being a JAVA virtual machine to form a multiple computer system,

FIG. 2 schematically illustrates “n” application running computers to which at least one additional server machine X is connected as a server,

FIG. 3 is a schematic map of the memory locations in all the multiple machines showing memory locations including classes and objects,

FIG. 4 is a single reachability table showing the various memory locations of FIG. 3 and their ability to be reached,

FIG. 5 shows multiple reachability tables equivalent to FIG. 4,

FIG. 6 is a map similar to FIG. 3 and showing memory location X pointing to memory location A,

FIG. 7 is the single reachability table corresponding to FIG. 6,

FIG. 8 shows the multiple reachability tables corresponding to FIG. 6,

FIG. 9 is a flow chart showing one embodiment of the procedure to be undertaken in the event of failure of one of the n application running computers,

FIG. 10 is a flow chart of the procedures of a second embodiment,

FIG. 11 is a map similar to FIG. 6 but illustrating the situation of a break in communications with machine M2,

FIG. 12 shows the single reachability table in the circumstance of FIG. 11, and

FIG. 13 shows the multiple reachability tables in the circumstance of FIG. 11.

DETAILED DESCRIPTION

The embodiments will be described with reference to the JAVA language; however, it will be apparent to those skilled in the art that the invention is not limited to this language and, in particular, can be used with other languages (including procedural, declarative and object oriented languages) including the MICROSOFT .NET platform and architecture (Visual Basic, Visual C, Visual C++, and Visual C#), FORTRAN, C, C++, COBOL, BASIC and the like.

It is known in the prior art to provide a single computer or machine (produced by any one of various manufacturers and having an operating system (or equivalent control software or other mechanism) operating in any one of various different languages) utilizing the particular language of the application by creating a virtual machine as illustrated in FIG. 1A.

The code and data and virtual machine configuration or arrangement of FIG. 1A takes the form of the application code 50 written in the JAVA language and executing within the JAVA virtual machine 61. Thus where the intended language of the application is the language JAVA, a JAVA virtual machine is used which is able to operate code in JAVA irrespective of the machine manufacturer and internal details of the computer or machine. For further details, see “The JAVA Virtual Machine Specification” 2nd Edition by T. Lindholm and F. Yellin of Sun Microsystems Inc of the USA which is incorporated herein by reference.

This conventional art arrangement of FIG. 1A is modified in accordance with embodiments of the present invention by the provision of an additional facility which is conveniently termed a “distributed run time” or a “distributed run time system” DRT 71 and as seen in FIG. 1B.

In FIGS. 1B and 1C, the application code 50 is loaded onto the Java Virtual Machine(s) M1, M2, . . . Mn in cooperation with the distributed runtime system 71, through the loading procedure indicated by arrow 75 or 75A or 75B. As used herein the terms “distributed runtime” and the “distributed run time system” are essentially synonymous, and by way of illustration but not limitation are generally understood to include library code and processes which support software written in a particular language running on a particular platform. Additionally, a distributed runtime system may also include library code and processes which support software written in a particular language running within a particular distributed computing environment. A runtime system (whether a distributed runtime system or not) typically deals with the details of the interface between the program and the operating system such as system calls, program start-up and termination, and memory management. For purposes of background, a conventional Distributed Computing Environment (DCE) (that does not provide the capabilities of the inventive distributed run time or distributed run time system 71 used in the preferred embodiments of the present invention) is available from the Open Software Foundation. This Distributed Computing Environment (DCE) performs a form of computer-to-computer communication for software running on the machines, but among its many limitations, it is not able to implement the desired modification or communication operations. Among its functions and operations the preferred DRT 71 coordinates the particular communications between the plurality of machines M1, M2, . . . Mn. Moreover, the preferred distributed runtime 71 comes into operation during the loading procedure indicated by arrow 75A or 75B of the JAVA application 50 on each JAVA virtual machine 72 or machines JVM#1, JVM#2, . . . JVM#n of FIG. 1C.

It will be appreciated in light of the description provided herein that although many examples and descriptions are provided relative to the JAVA language and JAVA virtual machines so that the reader may get the benefit of specific examples, the invention is not restricted to either the JAVA language or JAVA virtual machines, or to any other language, virtual machine, machine or operating environment.

FIG. 1C shows in modified form the arrangement of the JAVA virtual machines, each as illustrated in FIG. 1B. It will be apparent that again the same application code 50 is loaded onto each machine M1, M2 . . . Mn. However, the communications between each machine M1, M2 . . . Mn are as indicated by arrows 83, and although physically routed through the machine hardware, are advantageously controlled by the individual DRTs 71/1 . . . 71/n within each machine. Thus, in practice this may be conceptualised as the DRTs 71/1, . . . 71/n communicating with each other via the network or other communications link 53 rather than the machines M1, M2 . . . Mn communicating directly themselves or with each other. Contemplated and included are either this direct communication between machines M1, M2 . . . Mn or DRTs 71/1, 71/2 . . . 71/n or a combination of such communications. The preferred DRT 71 provides communication that is transport, protocol, and link independent.

The one common application program or application code 50 and its executable version (with likely modification) is simultaneously or concurrently executing across the plurality of computers or machines M1, M2 . . . Mn. The application program 50 is written to execute on a single machine or computer (or to operate on the multiple computer system of the abovementioned patent applications which emulate single computer operation). Essentially the modified structure is to replicate an identical memory structure and contents on each of the individual machines.

The term “common application program” is to be understood to mean an application program or application program code written to operate on a single machine, and loaded and/or executed in whole or in part on each one of the plurality of computers or machines M1, M2 . . . Mn, or optionally on each one of some subset of the plurality of computers or machines M1, M2 . . . Mn. Put somewhat differently, there is a common application program represented in application code 50. This is either a single copy or a plurality of identical copies each individually modified to generate a modified copy or version of the application program or program code. Each copy or instance is then prepared for execution on the corresponding machine. At the point after they are modified they are common in the sense that they perform similar operations and operate consistently and coherently with each other. It will be appreciated that a plurality of computers, machines, information appliances, or the like implementing embodiments of the invention may optionally be connected to or coupled with other computers, machines, information appliances, or the like that do not implement embodiments of the invention.

The same application program 50 (such as for example a parallel merge sort, or a computational fluid dynamics application or a data mining application) is run on each machine, but the executable code of that application program is modified on each machine as necessary such that each executing instance (copy or replica) on each machine coordinates its local operations on that particular machine with the operations of the respective instances (or copies or replicas) on the other machines such that they function together in a consistent, coherent and coordinated manner and give the appearance of being one global instance of the application (i.e. a “meta-application”).

The copies or replicas of the same or substantially the same application codes are each loaded onto a corresponding one of the interoperating and connected machines or computers. As the characteristics of each machine or computer may differ, the application code 50 may be modified before loading, or during the loading process, or with some disadvantages after the loading process, to provide a customization or modification of the application code on each machine. Some dissimilarity between the programs or application codes on the different machines may be permitted so long as the other requirements for interoperability, consistency, and coherency as described herein can be maintained. As will become apparent hereafter, each of the machines M1, M2 . . . Mn and thus all of the machines M1, M2 . . . Mn have the same or substantially the same application code 50, usually with a modification that may be machine specific.

Before the loading of, or during the loading of, or at any time preceding the execution of, the application code 50 (or the relevant portion thereof) on each machine M1, M2 . . . Mn, each application code 50 is modified by a corresponding modifier 51 according to the same rules (or substantially the same rules since minor optimizing changes are permitted within each modifier 51/1, 51/2 . . . 51/n).

Each of the machines M1, M2 . . . Mn operates with the same (or substantially the same or similar) modifier 51 (in some embodiments implemented as a distributed run time or DRT 71 and in other embodiments implemented as an adjunct to the application code and data 50, and also able to be implemented within the JAVA virtual machine itself). Thus all of the machines M1, M2 . . . Mn have the same (or substantially the same or similar) modifier 51 for each modification required. A different modification, for example, may be required for memory management and replication, for initialization, for finalization, and/or for synchronization (though not all of these modification types may be required for all embodiments).
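The idea of one modifier applying several independent modifications can be illustrated with a deliberately simplified model. This sketch assumes that application code is a list of textual pseudo-instructions rather than real JAVA bytecode, and the inserted calls drt.propagate and drt.acquireGlobalLock are hypothetical placeholders, not an actual DRT 71 interface. Each modification is a function from code to code, and the modifier applies them in sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class Main {
    // Toy stand-in for application code: one string per pseudo-instruction.
    // The modifications below are hypothetical examples, not real DRT calls.
    static final List<UnaryOperator<List<String>>> MODIFICATIONS = List.of(
        // Memory management/replication: after each field write, notify the DRT.
        code -> insertAfter(code, "PUTFIELD", "CALL drt.propagate"),
        // Synchronization: after a local lock is taken, also take a global one.
        code -> insertAfter(code, "MONITORENTER", "CALL drt.acquireGlobalLock")
    );

    // Returns a copy of 'code' with 'extra' inserted after each instruction
    // that starts with 'opcode'.
    static List<String> insertAfter(List<String> code, String opcode, String extra) {
        List<String> out = new ArrayList<>();
        for (String insn : code) {
            out.add(insn);
            if (insn.startsWith(opcode)) out.add(extra);
        }
        return out;
    }

    // The modifier: apply every modification in turn, using the same rules on every machine.
    static List<String> modify(List<String> code) {
        List<String> result = code;
        for (UnaryOperator<List<String>> m : MODIFICATIONS) result = m.apply(result);
        return result;
    }

    public static void main(String[] args) {
        List<String> original = List.of("ALOAD 0", "ICONST_5", "PUTFIELD count", "RETURN");
        System.out.println(modify(original));
    }
}
```

Keeping each modification as a separate function mirrors the point that not every modification type is required in every embodiment: one can be dropped from the list without affecting the others.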

There are alternative implementations of the modifier 51 and the distributed run time 71. For example, as indicated by broken lines in FIG. 1C, the modifier 51 may be implemented as a component of or within the distributed run time 71, and therefore the DRT 71 may implement the functions and operations of the modifier 51. Alternatively, the function and operation of the modifier 51 may be implemented outside of the structure, software, firmware, or other means used to implement the DRT 71 such as within the code and data 50, or within the JAVA virtual machine itself. In one embodiment, both the modifier 51 and DRT 71 are implemented or written in a single piece of computer program code that provides the functions of the DRT and modifier. In this case the modifier function and structure is, in practice, subsumed into the DRT. Independent of how it is implemented, the modifier function and structure is responsible for modifying the executable code of the application code program, and the distributed run time function and structure is responsible for implementing communications between and among the computers or machines. The communications functionality in one embodiment is implemented via an intermediary protocol layer within the computer program code of the DRT on each machine. The DRT can, for example, implement a communications stack in the JAVA language and use the Transmission Control Protocol/Internet Protocol (TCP/IP) to provide for communications or talking between the machines. These functions or operations may be implemented in a variety of ways, and it will be appreciated in light of the description provided herein that exactly how these functions or operations are implemented or divided between structural and/or procedural elements, or between computer program code or data structures, is not important or crucial to the invention.
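As an illustration of what such an intermediary protocol layer might carry, the following sketch encodes a single memory update into a byte frame with java.io.DataOutputStream and decodes it with java.io.DataInputStream. The wire format (location name, new value, sequence number) and the Update record are assumptions made for this example only; no message format is specified here. Records require Java 16 or later.

```java
import java.io.*;

public class Main {
    // Hypothetical wire format for one replicated-memory update:
    // [modified-UTF location name][int new value][long sequence number].
    record Update(String location, int value, long sequence) {}

    static byte[] encode(Update u) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(u.location());
            out.writeInt(u.value());
            out.writeLong(u.sequence());
        }
        return bytes.toByteArray();
    }

    static Update decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Update(in.readUTF(), in.readInt(), in.readLong());
        }
    }

    public static void main(String[] args) throws IOException {
        Update sent = new Update("A", 42, 7L);
        byte[] frame = encode(sent);
        // In a deployment these bytes could be written to a java.net.Socket's
        // output stream; here they are decoded directly to show the round trip.
        System.out.println(decode(frame) + " in " + frame.length + " bytes");
    }
}
```

Because the frame is just bytes, the same encoding could travel over TCP/IP or any other link, which is consistent with a DRT that is transport, protocol, and link independent.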

However, in the arrangement illustrated in FIG. 1C, a plurality of individual computers or machines M1, M2 . . . Mn are provided, each of which is interconnected via a communications network 53 or other communications link. Each individual computer or machine is provided with a corresponding modifier 51. Each individual computer is also provided with a communications port which connects to the communications network. The communications network 53 or path can be any electronic signalling, data, or digital communications network or path and is preferably a slow speed, and thus low cost, communications path, such as a network connection over the Internet or any common networking configurations including ETHERNET or INFINIBAND and extensions and improvements thereto. Preferably, the computers are provided with one or more known communications ports (such as CISCO Power Connect 5224 Switches) which connect with the communications network 53.

As a consequence of the above described arrangement, if each of the machines M1, M2, . . . , Mn has, say, an internal or local memory capability of 10 MB, then the total memory available to the application code 50 in its entirety is not, as one might expect, the number of machines (n) times 10 MB. Nor is it the additive combination of the internal memory capability of all n machines. Instead it is either 10 MB, or some number greater than 10 MB but less than n×10 MB. Where the internal memory capacities of the machines differ, which is permissible, and the internal memory of one machine is smaller than the internal memory capability of at least one other of the machines, the size of the smallest memory of any of the machines may be used as the maximum memory capacity of the machines when such memory (or a portion thereof) is to be treated as ‘common’ memory (i.e. similar equivalent memory on each of the machines M1 . . . Mn) or otherwise used to execute the common application code.
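The fully replicated case of this arithmetic can be checked with a short sketch: the capacity usable as ‘common’ memory is the minimum of the local memory sizes, not their sum. The machine sizes are assumed values chosen for illustration, and the sketch deliberately ignores partial replication, which is what can lift the usable total above that minimum.

```java
import java.util.Map;
import java.util.TreeMap;

public class Main {
    // When every location is replicated on every machine, the usable common
    // capacity is bounded by the smallest local memory.
    static long commonCapacityMb(Map<String, Long> localMemoryMb) {
        return localMemoryMb.values().stream()
            .mapToLong(Long::longValue)
            .min()
            .orElseThrow(() -> new IllegalArgumentException("no machines"));
    }

    public static void main(String[] args) {
        // Assumed example sizes in MB; unequal capacities are permissible.
        Map<String, Long> machines = new TreeMap<>(Map.of("M1", 10L, "M2", 10L, "M3", 16L));
        long naiveSum = machines.values().stream().mapToLong(Long::longValue).sum();
        System.out.println("sum of local memories: " + naiveSum + " MB");                     // 36 MB
        System.out.println("fully replicated capacity: " + commonCapacityMb(machines) + " MB"); // 10 MB
    }
}
```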

However, even though the manner in which the internal memory of each machine is treated may initially appear to be a possible constraint on performance, how this results in improved operation and performance will become apparent hereafter. Naturally, each machine M1, M2 . . . Mn has a private (i.e. ‘non-common’) internal memory capability. The private internal memory capabilities of the machines M1, M2, . . . , Mn are normally approximately equal but need not be. For example, when a multiple computer system is implemented or organized using existing computers, machines, or information appliances, owned or operated by different entities, the internal memory capabilities may be quite different. On the other hand, if a new multiple computer system is being implemented, each machine or computer is preferably selected to have an identical internal memory capability, but this need not be so.

It is to be understood that the independent local memory of each machine represents only that part of the machine's total memory which is allocated to that portion of the application program running on that machine. Thus, other memory will be occupied by the machine's operating system and other computational tasks unrelated to the application program 50.

Non-commercial operation of a prototype multiple computer system indicates that not every machine or computer in the system utilises or needs to refer to (e.g. have a local replica of) every possible memory location. As a consequence, it is possible to operate a multiple computer system without the local memory of each machine being identical to every other machine, so long as the local memory of each machine is sufficient for the operation of that machine. That is to say, provided a particular machine does not need to refer to (for example have a local replica of) some specific memory locations, then it does not matter that those specific memory locations are not replicated in that particular machine.

It may also be advantageous to select the amounts of internal memory in each machine to achieve a desired performance level in each machine and across a constellation or network of a connected or coupled plurality of machines, computers, or information appliances M1, M2, . . . , Mn. Having described these internal and common memory considerations, it will be apparent in light of the description provided herein that the amount of memory that can be common between machines is not a limitation.

In some embodiments, some or all of the plurality of individual computers or machines can be contained within a single housing or chassis (such as so-called “blade servers” manufactured by Hewlett-Packard Development Company, Intel Corporation, IBM Corporation and others), can be provided by multiple processors (e.g. symmetric multiple processors or SMPs) or multiple core processors (e.g. dual core processors and chip multithreading processors) manufactured by Intel, AMD, or others, or can be implemented on a single printed circuit board or even within a single chip or chip set. Similarly, also included are computers or machines having multiple cores, multiple CPUs or other processing logic.

When implemented in a non-JAVA language or application code environment, the generalized platform, and/or virtual machine and/or machine and/or runtime system is able to operate application code 50 in the language(s) (possibly including for example, but not limited to any one or more of source-code languages, intermediate-code languages, object-code languages, machine-code languages, and any other code languages) of that platform and/or virtual machine and/or machine and/or runtime system environment, and utilize the platform, and/or virtual machine and/or machine and/or runtime system and/or language architecture irrespective of the machine or processor manufacturer and the internal details of the machine. It will also be appreciated that the platform and/or runtime system can include virtual machine and non-virtual machine software and/or firmware architectures, as well as hardware and direct hardware coded applications and implementations.

For a more general set of virtual machine or abstract machine environments, and for current and future computers and/or computing machines and/or information appliances or processing systems that may not utilize or require utilization of either classes and/or objects, the inventive structure, method and computer program and computer program product are still applicable. Examples of computers and/or computing machines that do not utilize either classes and/or objects include for example, the x86 computer architecture manufactured by Intel Corporation and others, the SPARC computer architecture manufactured by Sun Microsystems, Inc and others, the Power PC computer architecture manufactured by International Business Machines Corporation and others, and the personal computer products made by Apple Computer, Inc., and others.

For these types of computers, computing machines, information appliances, and the virtual machine or virtual computing environments implemented thereon that do not utilize the idea of classes or objects, the invention may be generalized, for example, to include primitive data types (such as integer data types, floating point data types, long data types, double data types, string data types, character data types and Boolean data types), structured data types (such as arrays and records), derived types, or other code or data structures of procedural languages or other languages and environments such as functions, pointers, components, modules, structures, references and unions. These structures and procedures when applied in combination when required, maintain a computing environment where memory locations, address ranges, objects, classes, assets, resources, or any other procedural or structural aspect of a computer or computing environment are where required created, maintained, operated, and deactivated or deleted in a coordinated, coherent, and consistent manner across the plurality of individual machines M1, M2 . . . Mn.

This analysis or scrutiny of the application code 50 can take place either prior to loading the application program code 50, or during the application program code 50 loading procedure, or even after the application program code 50 loading procedure (or some combination of these). It may be likened to an instrumentation, program transformation, translation, or compilation procedure in that the application code can be instrumented with additional instructions, and/or otherwise modified by meaning-preserving program manipulations, and/or optionally translated from an input code language to a different code language (such as for example from source-code language or intermediate-code language to object-code language or machine-code language). In this connection it is understood that the term compilation normally or conventionally involves a change in code or language, for example, from source code to object code or from one language to another language. However, in the present instance the term “compilation” (and its grammatical equivalents) is not so restricted and can also include or embrace modifications within the same code or language. For example, the compilation and its equivalents are understood to encompass both ordinary compilation (such as for example by way of illustration but not limitation, from source-code to object code), and compilation from source-code to source-code, as well as compilation from object-code to object code, and any altered combinations therein. It is also inclusive of so-called “intermediary-code languages” which are a form of “pseudo object-code”.

By way of illustration and not limitation, in one embodiment, the analysis or scrutiny of the application code 50 takes place during the loading of the application program code such as by the operating system reading the application code 50 from the hard disk or other storage device, medium or source and copying it into memory and preparing to begin execution of the application program code. In another embodiment, in a JAVA virtual machine, the analysis or scrutiny may take place during the class loading procedure of the java.lang.ClassLoader.loadClass method (e.g. “java.lang.ClassLoader.loadClass( )”).

Alternatively, or additionally, the analysis or scrutiny of the application code 50 (or of a portion of the application code) may take place even after the application program code loading procedure, such as after the operating system has loaded the application code into memory, or optionally even after execution of the relevant corresponding portion of the application program code has started, such as for example after the JAVA virtual machine has loaded the application code into the virtual machine via the “java.lang.ClassLoader.loadClass( )” method and optionally commenced execution.

Persons skilled in the computing arts will be aware of various possible techniques that may be used in the modification of computer code, including but not limited to instrumentation, program transformation, translation, or compilation means and/or methods.

One such technique is to make the modification(s) to the application code, without a preceding or consequential change of the language of the application code. Another such technique is to convert the original code (for example, JAVA language source-code) into an intermediate representation (or intermediate-code language, or pseudo code), such as JAVA byte code. Once this conversion takes place the modification is made to the byte code and then the conversion may be reversed. This gives the desired result of modified JAVA code.

A further possible technique is to convert the application program to machine code, either directly from source-code or via the abovementioned intermediate language or through some other intermediate means. The machine code is then modified before being loaded and executed. A still further such technique is to convert the original code to an intermediate representation, which is then modified and subsequently converted into machine code.

The present invention encompasses all such modification routes and also a combination of two, three or even more, of such routes.

The DRT 71 or other code modifying means is responsible for creating or replicating a memory structure and contents on each of the individual machines M1, M2 . . . Mn that permits the plurality of machines to interoperate. In some embodiments this replicated memory structure will be identical. In other embodiments this memory structure will have portions that are identical and other portions that are not. In still other embodiments the memory structures differ only in format or storage conventions, such as Big Endian or Little Endian formats or conventions.
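To illustrate the last case, the minimal Python sketch below uses the standard `struct` module to store the same 32-bit value under the Big Endian and Little Endian conventions. The raw bytes differ between the two machines, yet both decode to the same value, so the replicated memory content is the same. The helper names and the choice of a 4-byte signed integer are assumptions made for the example.

```python
import struct


def encode(value: int, big_endian: bool) -> bytes:
    """Store a 32-bit signed integer in the given byte order."""
    # '>' selects big-endian, '<' little-endian; 'i' is a 4-byte signed int
    return struct.pack(">i" if big_endian else "<i", value)


def decode(raw: bytes, big_endian: bool) -> int:
    """Read back a 32-bit signed integer stored in the given byte order."""
    return struct.unpack(">i" if big_endian else "<i", raw)[0]


def convert(raw: bytes, from_big: bool, to_big: bool) -> bytes:
    """Re-encode a replicated value from a sender's convention to a receiver's."""
    return encode(decode(raw, from_big), to_big)


value = 0x01020304
on_big_endian_machine = encode(value, big_endian=True)
on_little_endian_machine = encode(value, big_endian=False)
```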

These structures and procedures, when applied in combination as required, maintain a computing environment in which the memory locations, address ranges, objects, classes, assets, resources, or any other procedural or structural aspect of a computer or computing environment are, where required, created, maintained, operated, and deactivated or deleted in a coordinated, coherent, and consistent manner across the plurality of individual machines M1, M2 . . . Mn.

Therefore the terminology “one”, “single”, and “common” application code or program includes the situation where all machines M1, M2 . . . Mn are operating or executing the same program or code and not different (and unrelated) programs; in other words, copies or replicas of the same or substantially the same application code are loaded onto each of the interoperating and connected machines or computers.

In conventional arrangements utilising distributed software, memory access from one machine's software to memory physically located on another machine typically takes place via the network interconnecting the machines. Thus, the local memory of each machine is able to be accessed by any other machine and therefore cannot be said to be independent. However, because read and/or write access to memory physically located on another computer requires the use of the slow network interconnecting the computers, in these configurations such memory accesses can result in substantial delays in memory read/write processing operations, potentially of the order of 10⁶-10⁷ cycles of the central processing unit of the machine (given contemporary processor speeds). Ultimately this delay depends upon numerous factors, such as, for example, the speed, bandwidth, and/or latency of the communication network. This in large part accounts for the diminished performance of the multiple interconnected machines in the prior art arrangement.

However, in the present arrangement all reading of memory locations or data is satisfied locally because a current value of all (or some subset of all) memory locations is stored on the machine carrying out the processing which generates the demand to read memory.

Similarly, all writing of memory locations or data is satisfied locally because a current value of all (or some subset of all) memory locations is stored on the machine carrying out the processing which generates the demand to write to memory.

Such local memory read and write processing operations can typically be satisfied within 10²-10³ cycles of the central processing unit. Thus, in practice there is substantially less waiting for memory accesses which involve reads and/or writes. Also, the local memory of each machine is not able to be accessed by any other machine and can therefore be said to be independent.
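Taking the cycle counts quoted above at face value, the ratio of remote to local access cost can be bounded as follows, where $t_{\text{remote}}$ and $t_{\text{local}}$ are symbols introduced here for the per-access cycle counts of the prior art and of the present arrangement respectively. This is order-of-magnitude arithmetic on the stated ranges, not a measured result.

```latex
\frac{10^{6}}{10^{3}} \;\le\; \frac{t_{\text{remote}}}{t_{\text{local}}} \;\le\; \frac{10^{7}}{10^{2}}
\quad\Longrightarrow\quad
10^{3} \;\le\; \frac{t_{\text{remote}}}{t_{\text{local}}} \;\le\; 10^{5}
```

The lower bound pairs the fastest remote access with the slowest local one, and the upper bound the reverse.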

The invention is transport, network, and communications path independent, and does not depend on how the communication between machines or DRTs takes place. In one embodiment, even electronic mail (email) exchanges between machines or DRTs may suffice for the communications.

In connection with the above, it will be seen from FIG. 2 that there are a number of machines M1, M2, . . . Mn, “n” being an integer greater than or equal to two, on which the application program 50 of FIG. 1 is being run substantially simultaneously. These machines are allocated a number 1, 2, 3, . . . etc. in a hierarchical order. This order is normally looped or closed so that whilst machines 2 and 3 are hierarchically adjacent, so too are machines “n” and 1. There is preferably a further machine X which is provided to enable various housekeeping functions to be carried out, such as acting as a lock server. In particular, the further machine X can be a low value machine, much less expensive than the other machines, which can have desirable attributes such as processor speed. Furthermore, an additional low value machine (X+1) is preferably available to provide redundancy in case machine X should fail. Where two such server machines X and X+1 are provided, they are preferably, for reasons of simplicity, operated as dual machines in a cluster configuration. Machines X and X+1 could be operated as a multiple computer system in accordance with the present invention, if desired. However this would result in generally undesirable complexity. If the machine X is not provided then its functions, such as housekeeping functions, are provided by one, or some, or all of the other machines.
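The looped hierarchical order can be expressed with modular arithmetic. The Python sketch below (the function name is hypothetical) returns the two machines adjacent to machine k among n machines numbered from 1, so that machines n and 1 are treated as adjacent.

```python
def neighbours(k: int, n: int) -> tuple[int, int]:
    """Return (previous, following) machines adjacent to machine k in a closed loop."""
    if n < 2 or not 1 <= k <= n:
        raise ValueError("need n >= 2 and 1 <= k <= n")
    previous = (k - 2) % n + 1   # machine 1 wraps back to machine n
    following = k % n + 1        # machine n wraps forward to machine 1
    return previous, following
```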

Turning now to FIG. 3, each of the multiple machines M1, M2 . . . Mn (other than any server machine X if present) has its memory locations schematically illustrated. For machine M1 there is a class X1 and an object B. For machine M2 there is a class X2, which is the same as for machine M1, and an object D. For machine Mn there is the same class Xn and two objects A and E. The contents of the memory location X are the same for each of the machines and each machine is able to both read from, and write to, memory location X. For this reason, the boundary of memory location X is indicated with a double line.

Preferably, the server machine X of FIG. 2 maintains a table listing each memory location and the machines which are able to access each memory location in the table. Such a table is said to be a reachability table and is illustrated in FIG. 4. The first row in the table of FIG. 4 deals with memory location A, which is only able to be accessed by machine Mn. The second row in the table of FIG. 4 deals with memory location B, which is only able to be accessed by machine M1. Similarly, object D is only able to be accessed by machine M2 and object E is only able to be accessed by machine Mn. However, the class X is able to be accessed by all of the machines M1, M2 and Mn.

The single reachability table of FIG. 4 is preferably located in, and maintained by, the server machine X. However, it is also possible for the computer system to be operated without a server machine X, in which case it is desirable for each machine to operate its own reachability table. FIG. 5 illustrates individual reachability tables for the individual machines in the circumstances corresponding to FIG. 4.

Thus, in FIG. 5 the table for machine M1 has a row for class X and a row for object B. Similarly, the table for machine M2 has a row for class X and a row for object D. However, the table for machine Mn has three rows, one for class X, and one for each of objects A and E.
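A minimal Python sketch of the two table arrangements, assuming each row simply maps a memory location name to the set of machines able to reach it; the dictionary layout and the function name are illustrative assumptions. It holds the single table of FIG. 4 and derives from it individual tables corresponding to FIG. 5, in which each machine keeps rows only for the locations it can reach.

```python
# Single reachability table held by server machine X (cf. FIG. 4):
# memory location name -> set of machines able to reach it
SERVER_TABLE = {
    "A": {"Mn"},
    "B": {"M1"},
    "D": {"M2"},
    "E": {"Mn"},
    "X": {"M1", "M2", "Mn"},
}


def per_machine_tables(table):
    """Derive individual per-machine tables (cf. FIG. 5) from a single table."""
    local = {}
    for location, machines in table.items():
        for machine in machines:
            # each machine gets a row for every location it can reach,
            # recording the full set of machines sharing that location
            local.setdefault(machine, {})[location] = set(machines)
    return local


tables = per_machine_tables(SERVER_TABLE)
```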

In the multi-machine environment described above, in the event that the content of class X is changed by being written to by one of the machines, it is necessary to transmit that change in content via the network 53 to all the other machines. However, since the objects A, B, D and E are each only able to be accessed by a single machine, there is little point in either creating or updating the contents of these memory locations on the other machines, since they are only able to be accessed by their local machine.

If now, during the processing carried out by a particular machine, say machine Mn, the class Xn needs to refer to the object A, then class Xn is said to point to object A. This is indicated in FIG. 6 by an arrow pointing from class Xn to object A. The change in status of object A means that it is now able to be accessed or referenced by all the other machines. For this reason, in FIG. 6 it is named object An, is bounded by double lines, and is reproduced in each of the other machines as object A1, A2, etc. Furthermore, an arrow points from each corresponding class X1, X2, etc. to the corresponding referred object A1, A2, etc. As a result of this change of status of object A, the first row of the reachability table of FIG. 4 is amended as illustrated in FIG. 7 so as to indicate that object A is now able to be reached by the machines M1, M2 and Mn. The server machine X of FIG. 2 uses the amended reachability table of FIG. 7 to ensure that the contents of object A, if amended by one machine, are transmitted via the network 53 to all the other machines.

Similarly, for the situation where multiple reachability tables are used, when the change illustrated by comparison of FIGS. 3 and 6 takes place, class Xn now refers to object An, and thus all the other classes X1, X2, etc. must now refer to corresponding objects A1, A2, etc., so all machines must now include a row in their reachability table for object A. This is the situation illustrated in FIG. 8. The other machines are said to inherit the table entry for object A.

The abovementioned detailed description refers to memory locations; however, it is equally applicable to structures, assets or resources (which in JAVA are termed classes or objects). These will have already been allocated a (global) name or tag which can be used globally by all machines (since it is understood that the local memory structure of different machines may be different). Thus the local or actual name allocated to a specific memory location in one machine may well be different from the local name allocated to the corresponding memory location in another machine. This global name allocation preferably happens during a compilation process at loading when the classes or objects are originally initialized. This is most conveniently done via a table maintained by the server machine X. This table can also include the reachability data.

It will be apparent to those skilled in the art that the reachability data enables structures, assets or resources (ie memory locations) to be divided into two categories or classes. The first category consists of those locations which are able to be accessed by all machines. It is necessary that write actions carried out in respect of such memory locations be distributed to all machines so that all corresponding memory locations have the same content (except for delays due to transmission of updating data). However, in respect of the second category, since these memory locations are only accessible by the local machine, write actions to these memory locations need not be distributed to all the other machines, nor need there be corresponding memory locations on the other machines. As a consequence of this categorisation, a substantial volume of data is not required to be transmitted from one machine to the others and so the volume of traffic on the network 53 is substantially reduced.
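The categorization and its effect on network traffic can be sketched in Python as follows, assuming the table layout of a location name mapped to a set of machines, filled in here as in FIG. 7. Following claim 1, a location is treated as first category when it is replicated on two or more machines. Because rows for local-only locations may be omitted altogether, a location with no row is treated as second category. The function names are hypothetical.

```python
# Reachability table after object A has become shared (cf. FIG. 7)
FIG7_TABLE = {
    "A": {"M1", "M2", "Mn"},
    "B": {"M1"},
    "D": {"M2"},
    "E": {"Mn"},
    "X": {"M1", "M2", "Mn"},
}


def categorize(table):
    """Split locations into first (replicated) and second (local-only) categories."""
    first = {loc for loc, machines in table.items() if len(machines) > 1}
    second = set(table) - first
    return first, second


def update_recipients(table, location, writer):
    """Machines that must be sent a write made by `writer` to `location`.

    A location with no row is assumed to be local-only, so nothing is sent.
    """
    machines = table.get(location, {writer})
    return set(machines) - {writer}


first, second = categorize(FIG7_TABLE)
```

Only writes to first-category locations produce any network traffic; a write to B, D or E yields an empty recipient set.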

Machine Mn can determine whether object A requires replication across the other machines by consulting the table entry for class X on machine Mn. In the situation illustrated, machine Mn makes a positive determination for replication by comparing the table entries for object A and class X. If the table entry for object A already includes all of the machines in the table entry for class X, then machine Mn can correctly determine that object A does not need to be replicated on any other machines and, additionally, that no table entries need to be added to, or updated on, other machines. Alternatively, if the table entry for object A does not include the full set of machines in the table entry for class X, then machine Mn updates the table entry for object A to include the set of machines listed in the table entry for class X, and additionally instructs all machines listed in the new table entry for object A to update their local tables for object A with the set of machines listed in the new table entry for object A on machine Mn. Finally, for the set of machines which were not already present in the table entry for object A on machine Mn prior to the inheritance of the set of machines of class X on machine Mn, machine Mn instructs those machines (ie machines M1 and M2) to add a local table entry for object A and to create local replicas in memory of object A and associated references to class X.
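The decision procedure above can be sketched in Python, assuming per-machine tables that map each location to the set of machines able to reach it (the layout of FIG. 5). The function name is hypothetical, and sending the instructions over the network 53 is reduced here to direct updates of the other machines' tables.

```python
def inherit_reachability(tables, machine, referrer, referee):
    """On `machine`, `referrer` (e.g. class X) now points to `referee` (e.g. object A).

    `tables` maps machine name -> {location: set of machines}. Returns the
    machines that must newly add a row for `referee` and create a replica of it.
    """
    local = tables[machine]
    needed = local[referrer]
    # no row yet means the referee has so far been local-only on this machine
    current = local.get(referee, {machine})
    if needed <= current:
        # referee is already reachable wherever referrer is: nothing to do
        return set()
    updated = current | needed
    newcomers = updated - current
    for other in updated:
        # every machine in the new entry records the same set of machines;
        # newcomers get a fresh row (a real system would also create the replica)
        tables.setdefault(other, {})[referee] = set(updated)
    return newcomers


# Individual tables before the change (cf. FIG. 5)
tables = {
    "M1": {"X": {"M1", "M2", "Mn"}, "B": {"M1"}},
    "M2": {"X": {"M1", "M2", "Mn"}, "D": {"M2"}},
    "Mn": {"X": {"M1", "M2", "Mn"}, "A": {"Mn"}, "E": {"Mn"}},
}
created = inherit_reachability(tables, "Mn", "X", "A")
```

A second call with the same arguments returns an empty set, matching the case where object A's entry already covers every machine in class X's entry.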

In addition to reducing the volume of data required to be transmitted via the network 53, the abovementioned categorization and reachability table(s) also provide an advantage in the event of failure of one of the computers, namely that the entire system does not fail. Instead the system is able to recover.

To continue the above example, suppose that in the memory condition illustrated in FIG. 6, machine M2, say, fails (for example due to failure of its power supply, its CPU, or its link to the network 53, or a similar catastrophic failure). This failure is able to be detected by a conventional detector attached to each of the application program running machines and reporting to machine X, for example.

Such detectors are commercially available, for example products using the Simple Network Management Protocol (SNMP). Such a detector is essentially a small program which operates in the background and provides a specified output signal in the event that failure is detected.

Such a detector is able to sense failure in a number of ways, any one or more of which can be used simultaneously. For example, machine X can interrogate each of the other machines M1, . . . Mn in turn requesting a reply. If no reply is forthcoming after a predetermined time, or after a small number of “reminders” are sent, also without reply, the non-responding machine is pronounced “dead”.
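A minimal Python sketch of the interrogation mode, assuming a hypothetical `ping(machine)` callable that sends a request and returns True only if a reply arrives within the predetermined time. The default of two reminders is an arbitrary choice, not a value from the specification, and the simulated replies exist only to exercise the function.

```python
def interrogate(machines, ping, reminders=2):
    """Poll each machine in turn and return those pronounced "dead".

    `ping(machine)` is a hypothetical callable returning True if a reply
    arrives within the predetermined time, False otherwise.
    """
    dead = []
    for machine in machines:
        # one initial request plus a small number of reminders;
        # any() stops at the first reply received
        if not any(ping(machine) for _ in range(1 + reminders)):
            dead.append(machine)
    return dead


# Simulated replies: M2 never answers, M3 answers the first reminder
replies = {"M1": [True], "M2": [False, False, False], "M3": [False, True]}


def simulated_ping(machine):
    return replies[machine].pop(0)


dead = interrogate(["M1", "M2", "M3"], simulated_ping)
```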

Alternatively, or additionally, each of the machines M1, . . . Mn can at regular intervals, say every 30 seconds, send a predetermined message to machine X (or to all other machines in the absence of a server) to say that all is well. In the absence of such a message the machine can be presumed “dead”, or can be interrogated and, if it then fails to respond, pronounced “dead”.
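The regular-report mode can be sketched in Python as follows. Time is passed in explicitly (in a running system it might come from `time.monotonic()`), which keeps the sketch deterministic. The class name and the tolerance of two missed intervals are assumptions. A machine returned by `presumed_dead` could then be interrogated, as described above, before being pronounced dead.

```python
class HeartbeatMonitor:
    """Presume a machine dead if its "all is well" message is overdue."""

    def __init__(self, interval=30.0, missed_allowed=2):
        self.interval = interval              # seconds between expected messages
        self.missed_allowed = missed_allowed  # overdue intervals tolerated (assumed)
        self.last_seen = {}

    def heard_from(self, machine, now):
        """Record receipt of the predetermined message at time `now` (seconds)."""
        self.last_seen[machine] = now

    def presumed_dead(self, now):
        """Machines silent for longer than the tolerated period, sorted by name."""
        limit = self.interval * self.missed_allowed
        return sorted(m for m, t in self.last_seen.items() if now - t > limit)


monitor = HeartbeatMonitor()
monitor.heard_from("M1", now=0.0)
monitor.heard_from("M2", now=0.0)
monitor.heard_from("M1", now=50.0)
```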

Further methods include looking for a turn-on event in an uninterruptible power supply (UPS) used to power each machine, which indicates a failure of mains power. Similarly, conventional switches such as those manufactured by CISCO of California, USA include a provision to check either the presence of power to the communications network 53, or whether the network cable is disconnected.

In some circumstances, for example for enhanced redundancy or for increased bandwidth, each individual machine can be “multi-peered”, which means there are two or more links between the machine and the communications network 53. An SNMP product which provides two options in this circumstance, namely to wait for both/all links to fail before signalling machine failure, or to signal machine failure if any one link fails, is the 12 Port Gigabit Managed Switch GSM 7212 sold under the trade marks NETGEAR and PROSAFE.
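The two options can be written as a small decision function. This Python sketch is illustrative only: the policy names `"all"` and `"any"` are hypothetical and are not the configuration settings of any particular switch.

```python
def machine_failed(link_up, policy="all"):
    """Decide failure of a multi-peered machine from the states of its links.

    link_up: one boolean per link to the network (True = link working).
    policy "all": signal failure only when every link is down.
    policy "any": signal failure as soon as any one link is down.
    """
    if policy == "all":
        return not any(link_up)
    if policy == "any":
        return not all(link_up)
    raise ValueError(f"unknown policy: {policy!r}")
```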

In the event that machine failure is detected, the procedures illustrated in FIG. 9 then come into operation. Step 91 in FIG. 9 is triggered by the detection of machine failure (for machine M2 in this example). As a consequence, machine X examines each record (or row) in its reachability table in turn as indicated at step 92. For each record the question of step 93 is asked to determine if the record in question refers to failed machine M2. If it does, at step 94 the reference to failed machine M2 is removed from the row, and then step 95 is commenced. If it does not, step 95 is commenced immediately.

As indicated in FIG. 9, at step 95 any remaining record or row in the table is then subjected to step 93 until eventually all records have been interrogated and thus no further action is required, as indicated at step 96.
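A minimal Python sketch of the FIG. 9 loop, assuming the single table maps each location name to a mutable set of machines as in FIG. 7. Step numbers are noted in the comments; the function name is hypothetical. In this sketch the row for location D, which was reachable only by M2, ends up empty.

```python
def remove_failed_machine(table, failed):
    """FIG. 9: strip every reference to `failed` from a single reachability table."""
    for machines in table.values():      # steps 92 and 95: take each record in turn
        if failed in machines:           # step 93: does the record refer to it?
            machines.discard(failed)     # step 94: remove the reference
    return table                         # step 96: all records examined


table = {
    "A": {"M1", "M2", "Mn"},
    "B": {"M1"},
    "D": {"M2"},
    "E": {"Mn"},
    "X": {"M1", "M2", "Mn"},
}
remove_failed_machine(table, "M2")
```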

Turning now to FIG. 10, in the event that there is no server machine X and instead there are a multiplicity of individual reachability tables, each of the machines is able to detect failure of any one of the other machines (for example, by means of all machines providing a predetermined message at regular intervals). In addition, each of the continuing machines M1, M3 . . . Mn carries out steps 101-105 of FIG. 10, which are equivalent to steps 91-95 of FIG. 9 but in respect of its local reachability table. The result is that each of the local reachability tables makes no reference to machine M2 (ie has the column containing the “2's” empty or removed).
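With per-machine tables, the same clean-up runs on each continuing machine against its own local table. In this Python sketch all the tables are held in one dictionary purely for illustration (in practice each table lives on its own machine), the starting tables are those after the inheritance of FIG. 8, and dropping the failed machine's own table is an assumption of the sketch. The function name is hypothetical.

```python
def handle_peer_failure(tables, failed):
    """FIG. 10: each continuing machine removes `failed` from its own table.

    `tables` maps machine name -> {location: set of machines}.
    """
    # Assumption: the failed machine's own table is simply dropped
    tables.pop(failed, None)
    for local in tables.values():          # carried out on every continuing machine
        for machines in local.values():    # steps 102-105 on its local table
            machines.discard(failed)
    return tables


tables = {
    "M1": {"X": {"M1", "M2", "Mn"}, "A": {"M1", "M2", "Mn"}, "B": {"M1"}},
    "M2": {"X": {"M1", "M2", "Mn"}, "A": {"M1", "M2", "Mn"}, "D": {"M2"}},
    "Mn": {"X": {"M1", "M2", "Mn"}, "A": {"M1", "M2", "Mn"}, "E": {"Mn"}},
}
handle_peer_failure(tables, "M2")
```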

FIG. 11 schematically illustrates the situation where machine M2 is dead. This might be due to a power failure or, as indicated in FIG. 11, due to a break in the link between machine M2 and the communications network 53. The changed reachability tables (single and multiple) are respectively illustrated in FIGS. 12 and 13.

Thus the effect of the procedure of either FIG. 9 or FIG. 10 is to remove the column from the table of FIG. 7 which makes reference to machine M2. As a result, the subsequent updating of corresponding memory locations X1, Xn and A1, An can take place without machine M2 being active.

Therefore any action which requires an acknowledgement from machine M2, such as a response to the question “Has a data packet been received without error?”, and for which no response is possible because machine M2 has failed, does not delay the functioning of the other machines M1, M3, . . . Mn. As a consequence, those portions of the application program 50 which are executing on the continuing machines M1, M3, . . . Mn continue to execute without interruption.

It will also be apparent to those skilled in the art that the failure of machine M2 is not in any way special or restricted to the second machine. That is, it could have been any one of the machines which failed. Thus, if another machine should now fail, the same procedure is carried out. Therefore successive failure of each of a number of machines in turn can be tolerated without loss of memory, since the contents of memory locations X2 and A2 are duplicated elsewhere, whilst the content of memory location D will in due course be regenerated by the re-execution, by one of the continuing machines, of the code previously executing on machine M2.

The foregoing describes only some embodiments of the present invention and modifications, obvious to those skilled in the art, can be made thereto without departing from the scope of the present invention. For example, the tables of FIGS. 4 and 7 each show a row corresponding to each memory location. In practice, for those memory locations such as D and E which are only accessible by their local machine, it is not necessary to have a row in the table at all. Instead, such a row is only created if the memory location becomes accessible by one or more other machines. Furthermore, reference to JAVA includes both the JAVA language and also the JAVA platform and architecture.

Similarly, the above described arrangements envisage n computers, each of which shares a fraction (1/n th) of the application program. Under such circumstances all n computers have the same local memory structure. However, it is possible to operate such a system in which only a subset of the computers has the same local memory structure. Under this scenario, the maximum number of members of the subset is to be regarded as n in the description above.

In all described instances of modification, where the application code 50 is modified before, or during loading, or even after loading but before execution of the unmodified application code has commenced, it is to be understood that the modified application code is loaded in place of, and executed in place of, the unmodified application code subsequent to the modifications being performed.

Alternatively, in the instances where modification takes place after loading and after execution of the unmodified application code has commenced, it is to be understood that the unmodified application code may either be replaced with the modified application code in whole, corresponding to the modifications being performed, or alternatively, the unmodified application code may be replaced in part or incrementally as the modifications are performed incrementally on the executing unmodified application code. Regardless of which such modification routes are used, the modifications, once performed, execute in place of the unmodified application code.

It is advantageous to use a global identifier as a form of ‘meta-name’ or ‘meta-identity’ for all the similar equivalent local objects (or classes, or assets or resources or the like) on each one of the plurality of machines M1, M2 . . . Mn. For example, rather than having to keep track of each unique local name or identity of each similar equivalent local object on each machine of the plurality of similar equivalent objects, one may instead define or use a global name corresponding to the plurality of similar equivalent objects on each machine (e.g. “globalname7787”), with the understanding that each machine relates the global name to a specific local name or object (e.g. “globalname7787” corresponds to object “localobject456” on machine M1, “globalname7787” corresponds to object “localobject885” on machine M2, “globalname7787” corresponds to object “localobject111” on machine M3, and so forth).

It will also be apparent to those skilled in the art in light of the detailed description provided herein that in a table or list or other data structure created by each DRT 71 when initially recording or creating the list of all, or some subset of all, objects (e.g. memory locations or fields), for each such recorded object on each machine M1, M2 . . . Mn there is a name or identity which is common or similar on each of the machines M1, M2 . . . Mn. However, in the individual machines the local object corresponding to a given name or identity will or may vary over time, since each machine may, and generally will, store memory values or contents at different memory locations according to its own internal processes. Thus the table, or list, or other data structure in each of the DRTs will have, in general, different local memory locations corresponding to a single memory name or identity, but each global “memory name” or identity will have the same “memory value or content” stored in the different local memory locations. So for each global name there will be a family of corresponding independent local memory locations with one family member in each of the computers. Although the local memory name may differ, the asset, object, location etc. has essentially the same content or value. So the family is coherent.
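The global-name scheme amounts to a per-machine translation table. The Python sketch below uses the example names given above (“globalname7787” and its local counterparts); the table layout and the helper names are assumptions. An update is identified by its global name when transmitted, and each receiving machine resolves that name to its own local member of the family.

```python
# Per-machine name tables: global name -> local name (example names from above)
NAME_TABLES = {
    "M1": {"globalname7787": "localobject456"},
    "M2": {"globalname7787": "localobject885"},
    "M3": {"globalname7787": "localobject111"},
}


def to_local(machine, global_name):
    """Local name used on `machine` for the given global name."""
    return NAME_TABLES[machine][global_name]


def to_global(machine, local_name):
    """Global name for a local name on `machine` (linear search for brevity)."""
    for global_name, local in NAME_TABLES[machine].items():
        if local == local_name:
            return global_name
    raise KeyError(local_name)


def family(global_name):
    """The family of local names, one member per machine holding the location."""
    return {m: names[global_name] for m, names in NAME_TABLES.items()
            if global_name in names}


# A write to localobject456 on M1 is sent under its global name, and M2
# applies it to its own member of the family
received_as = to_local("M2", to_global("M1", "localobject456"))
```

A production table would more likely keep a reverse dictionary per machine rather than searching linearly.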

The term “table” or “tabulation” as used herein is intended to embrace any list or organised data structure of whatever format and within which data can be stored and read out in an ordered fashion.

It will also be apparent to those skilled in the art in light of the description provided herein that the abovementioned modification of the application program code 50 during loading can be accomplished in many ways or by a variety of means. These ways or means include, but are not limited to, at least the following five ways and variations or combinations of these five, including by:

(i) re-compilation at loading,
(ii) a pre-compilation procedure prior to loading,
(iii) compilation prior to loading,
(iv) “just-in-time” compilation(s), or
(v) re-compilation after loading (but, for example, before execution of the relevant or corresponding application code in a distributed environment).

Traditionally the term “compilation” implies a change in code or language, for example, from source to object code or one language to another. Clearly the use of the term “compilation” (and its grammatical equivalents) in the present specification is not so restricted and can also include or embrace modifications within the same code or language.

Those skilled in the computer and/or programming arts will be aware that when additional code or instructions is/are inserted into an existing code or instruction set to modify same, the existing code or instruction set may well require further modification (such as, for example, by re-numbering of sequential instructions) so that offsets, branching, attributes, mark up and the like are properly handled or catered for.
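A toy illustration of why re-numbering is needed, using a hypothetical instruction list in Python rather than real JAVA byte code: inserting an instrumentation call after a store shifts every later instruction, so any branch whose target lies after the insertion point must be adjusted. A few opcode names echo JVM mnemonics, but the encoding (absolute target indices) and the `record_write` call are invented for the example.

```python
BRANCH_OPCODES = {"goto", "ifeq"}


def insert_after(code, index, new_instrs):
    """Insert `new_instrs` after code[index] and re-target branches.

    Each instruction is an (opcode, operand) pair. For branch opcodes the
    operand is an absolute instruction index (a simplification; real JAVA
    byte code uses relative byte offsets). The inserted instructions are
    assumed to contain no branches themselves.
    """
    shift = len(new_instrs)

    def retarget(instr):
        opcode, operand = instr
        # branches to anything after the insertion point move down by `shift`
        if opcode in BRANCH_OPCODES and operand > index:
            return (opcode, operand + shift)
        return instr

    patched = [retarget(instr) for instr in code]
    return patched[: index + 1] + list(new_instrs) + patched[index + 1 :]


original = [
    ("load", "x"),
    ("ifeq", 4),        # jump to the return instruction
    ("store", "y"),
    ("goto", 0),
    ("return", None),
]
patched = insert_after(original, 2, [("call", "record_write")])
```

The `ifeq` target moves from 4 to 5 so that it still reaches `return`, while the backward `goto 0` is unaffected.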

Similarly, in the JAVA language memory locations include, for example, both fields and array types. The above description deals with fields and the changes required for array types are essentially the same mutatis mutandis. Also the present invention is equally applicable to programming languages similar to JAVA (including procedural, declarative and object orientated languages), including the Microsoft .NET platform and architecture (Visual Basic, Visual C/C++, and C#), FORTRAN, C/C++, COBOL, BASIC, etc.

The terms object and class used herein are derived from the JAVA environment and are intended to embrace similar terms derived from different environments such as dynamically linked libraries (DLL), or object code packages, or function unit or memory locations.

Various means are described relative to embodiments of the invention, including for example but not limited to lock means, distributed run time means, modifier or modifying means, and the like. In at least one embodiment of the invention, any one or each of these various means may be implemented by computer program code statements or instructions (possibly including by a plurality of computer program code statements or instructions) that execute within computer logic circuits, processors, ASICs, logic or electronic circuit hardware, microprocessors, microcontrollers or other logic to modify the operation of such logic or circuits to accomplish the recited operation or function. In another embodiment, any one or each of these various means may be implemented in firmware and in other embodiments such may be implemented in hardware. Furthermore, in at least one embodiment of the invention, any one or each of these various means may be implemented by a combination of computer program software, firmware, and/or hardware.

Any and each of the above-described methods, procedures, and/or routines may advantageously be implemented as a computer program and/or computer program product stored on any tangible media or existing in electronic, signal, or digital form. Such computer programs or computer program products comprise instructions, separately and/or organized as modules, programs, subroutines, or in any other way, for execution in processing logic such as in a processor or microprocessor of a computer, computing machine, or information appliance; the computer program or computer program products modify the operation of the computer in which they execute, or of a computer coupled with, connected to, or otherwise in signal communications with the computer on which the computer program or computer program product is present or executing. Such a computer program or computer program product modifies the operation and architectural structure of the computer, computing machine, and/or information appliance to alter the technical operation of the computer and realize the technical effects described herein.

The invention may therefore include a computer program product comprising a set of program instructions stored in a storage medium or existing electronically in any form and operable to permit a plurality of computers to carry out any of the methods, procedures, routines, or the like as described herein including in any of the claims.

Furthermore, the invention includes (but is not limited to) a plurality of computers, or a single computer adapted to interact with a plurality of computers, interconnected via a communication network or other communications link or path and each operable to substantially simultaneously or concurrently execute the same or a different portion of an application code written to operate on only a single computer on a corresponding different one of computers. The computers are programmed to carry out any of the methods, procedures, or routines described in the specification or set forth in any of the claims, on being loaded with a computer program product or upon subsequent instruction. Similarly, the invention also includes within its scope a single computer arranged to co-operate with like, or substantially similar, computers to form a multiple computer system.

To summarise, there is provided a failure resistant method of operating a plurality of computers each with their corresponding independent local memory, each substantially simultaneously operating a corresponding portion of an application program written to execute on only a single computer, and each being connected via a communications network to permit updating of corresponding memory locations, the method comprising the steps of:

(i) categorizing the memory locations of the local memories into a first reachability category in which the local memory locations are replicated in selected ones, or all, of the computers and therefore require updating via the communications network with changes to corresponding memory locations of the other computers to maintain substantial memory coherence, and into a second category in which the local memory locations are present only in the local computer and therefore no updating is required,
(ii) detecting failure of any one of the multiple computers, and
(iii) modifying the first category to remove therefrom, if present, any reference to the failed computer,
whereby no attempt is made to update any first category locations of the failed computer.

Preferably the method includes the further step of:

(iv) maintaining data regarding the memory locations categorization in a reachability table.

Preferably the method includes the further step of:

(v) maintaining a single reachability table on a server computer not forming one of the multiple computers and connected thereto via the communications network.

Alternatively the method includes the further step of:

(vi) maintaining a multiplicity of reachability tables, each on a corresponding one of the multiple computers.

Preferably the method includes the further step of:

(vii) detecting failure by at least one of the group of failure detection modes consisting of power supply failure, communication link failure, failure to respond to interrogation, and failure to regularly report as expected.

Preferably the memory locations include an asset, structure or resource.

There is also provided a computer program product comprising a set of program instructions stored in a storage medium and operable to permit a plurality of computers to carry out the above described method(s).

Also provided is a plurality of computers interconnected via a communications network and operable to ensure carrying out any of the above method(s).

Further there is provided a failure resistant multiple computer system in which a plurality of computers each has a corresponding independent local memory, each simultaneously operates a corresponding portion of an application program written to be executed only on a single computer, and each is connected via a communications network to permit updating of corresponding memory locations, the system including a reachability means to categorize memory locations of the local memories into a first category in which the local memory locations are replicated in selected ones, or all, of the computers and therefore require updating via the communications network with changes to corresponding memory locations of other computers, to maintain substantial memory coherence, and into a second category in which the local memory locations are present only in the local computer and therefore no updating is required, and wherein the system further includes a failure detection means connected to each computer to detect failure of any one of the multiple computers, and a reachability modifier connected to the failure detection means and to the reachability means to modify the reachability means by modifying the first category to remove therefrom, if present, any reference to the failed computer, whereby no attempt is made to update any first category memory locations of the failed computer.

Preferably the reachability means comprises a reachability table in which is maintained data regarding the memory location classification.

Preferably a server computer is connected to the communications network, the server computer including a single reachability table.

Alternatively each of the multiple computers includes a corresponding reachability table.

Preferably the failure detection means is selected from the group consisting of power supply failure detectors, communication link failure detectors, interrogation response failure detectors, and regular reporting failure detectors.

Preferably the memory locations include an asset, structure or resource.

Also provided is a single computer adapted to co-operate with at least one other computer in order to carry out any of the above method(s) or form any of the above computer systems.

The term “comprising” (and its grammatical variations) as used herein is used in the inclusive sense of “having” or “including” and not in the exclusive sense of “consisting only of”.

1. In a multiple computer system comprising a plurality of computers, each including a local processor and a local memory coupled with the local processor, and including a first computer and a second computer interconnected via a communications link or network operating in a replicated shared memory arrangement, a method of classifying said local memory(ies) and detecting a failure of at least one of said computers comprising: classifying said local memories into a first category of memory locations each of which is replicated on two or more computers of said plurality of computers; classifying said local memories into a second category of memory locations each of which is present only in the specific one of said plurality of computers in which each said second category of memory location is physically located; and detecting a failure of at least one of said computers.
2. A method as in claim 1, further including: a. maintaining a first table listing or recording said first category memory locations.
3. A method as in claim 2, further including: a. maintaining a second table listing or recording said second category memory locations.
4. A method as in claim 1, further including: a. maintaining a first table listing or recording said first category memory locations; b. maintaining a second table listing or recording said second category memory locations; and c. said first table and said second table are the same table or are different tables.
5. A method as in claim 2, further including: a. not maintaining a table listing or recording said second category memory locations.
6. A method as in claim 1, further including: a. maintaining at least one of a first table listing or recording said first category memory locations, and a second table listing or recording said second category memory locations, on a further server computer.
7. A method as in claim 1, further including: a. maintaining multiple ones of a first table listing or recording said first category memory locations, and a second table listing or recording said second category memory locations, at least one in each of said multiple computers.
8. A method as in claim 1, wherein said first category memory locations of a said computer do not access or refer to any second category memory locations of the same computer.
9. A method as in claim 8, wherein said access includes memory addresses of said second category memory locations.
10. A method as in claim 8, wherein said access includes pointers, references, handles, or links to or of said second category memory locations.
11. A method as in claim 1, wherein said memory locations comprise an object or objects.
12. A method as in claim 1, wherein said memory locations comprise a class or classes.
13. A method as in claim 1, wherein said memory locations comprise object field(s) or class field(s).
14. A method as in claim 1, wherein said memory locations comprise data structure(s).
15. A method as in claim 1, wherein said memory locations comprise array data structure(s).
16. A method as in claim 1, wherein said memory locations comprise elements of array data structure(s).
17. A method as in claim 1, wherein said memory locations comprise libraries, linked libraries, and/or dynamically linked libraries.
18. A method as in claim 1, further including: a. maintaining a replication table listing or recording the ones of said multiple computers on which a said first category memory location is replicated.
19. A method as in claim 18, further including: a. maintaining one said replication table for each said first category memory location.
20. A method as in claim 18, further including: a. maintaining one said replication table for a plurality of first category memory locations.
21. A method as in claim 20, further including: a. maintaining at least one said replication table for each plurality of first category memory locations of possible multiple pluralities.
22. A method as in claim 20, wherein said plurality of first category memory locations are plural memory locations of a related set of memory locations.
23. A method as in claim 22, wherein said related set of memory locations are an array of memory locations.
24. A method as in claim 23, wherein said array of memory locations comprise an array data structure.
25. A method as in claim 22, wherein said related set of memory locations are memory locations of an object or class.
26. A method as in claim 25, wherein said memory locations of an object or class are object fields or variables, or class fields or variables.
27. A method as in claim 18, wherein said replication table and either or both of, said table(s) of said first category memory locations and said table(s) of said second category memory locations, are a single or the same table.
28. A method as in claim 18, further including: a. maintaining multiple said replication tables, one in each of said multiple computers.
29. A method as in claim 18, further including: a. maintaining multiple said replication tables, one of each said multiple tables in each of said multiple computers.
30. A method as in claim 18, further including: a. maintaining multiple said replication tables in each of said multiple computers.
31. A method as in claim 18, further including: a. maintaining one said replication table, for all said multiple computers on a further server computer.
32. A method as in claim 1, further including: a. substantially simultaneously updating said first category memory locations of the other ones of said computers with any changes made to a first category memory location of any one of said computers.
33. A method as in claim 32, further including: a. utilizing said replication table(s) to determine which ones of said multiple computers are to be said substantially simultaneously updated.
34. A method as in claim 32, further including: a. not updating said second category memory locations of the other ones of said computers with any changes made to a second category memory location of any one of said computers.
35. A method as in claim 32, wherein said substantially simultaneous updating includes updating a first category memory locations of the other ones of said computers on which said first category memory locations is replicated, with any changes made to said first category memory locations of any one of said computers.
36. A method as in claim 32, wherein said substantially simultaneous updating excludes updating said first category memory locations of the other ones of said computers on which said first category memory locations is not replicated.
37. A method as in claim 1, including the further steps of: a. detecting failure of any one or a plurality of said multiple computers.
38. A method as in claim 37, wherein said failure includes discontinued or erroneous operation of said one(s) of said multiple computers.
39. A method as in claim 37, wherein said detecting failure includes detecting failure by at least one of the group of failure detection modes consisting of power supply failure, communication link failure, failure to respond to interrogation, and failure to regularly report as expected.
40. A method as in claim 37, including the further step of: a. not substantially simultaneously updating (or discontinuing the substantially simultaneous updating of) said failed computer(s) with any changes made to first category memory location(s).
41. A method as in claim 37, including the further step of: a. not substantially simultaneously updating (or discontinuing the substantially simultaneous updating of) said failed computer(s) with any changes made to first category memory location(s) previously replicated on said failed computer(s) prior to (or upon occasion of) said failure.
42. A method as in claim 37, including the further step of: a. upon occasion of said failure by one(s) of said multiple computers, updating said table(s) of first category memory location(s) to exclude (or remove) said failed one(s) of said multiple computers.
43. A method as in claim 37, including the further step of: a. upon occasion of said failure by one(s) of said multiple computers on which said first category memory location(s) is (or were) replicated, updating said table(s) of first category memory location(s) to exclude (or remove) said failed one(s) of said multiple computers.
44. A method as in claim 37, including the further step of: a. upon occasion of said failure by one(s) of said multiple computers, updating said replication table(s) to exclude (or remove) said failed one(s) of said multiple computers.
45. A method as in claim 37, including the further step of: a. upon occasion of said failure by one(s) of said multiple computers on which said first category memory location is (or were) replicated, updating said replication table(s) to exclude (or remove) said failed one(s) of said multiple computers.
46. A method as in claim 42, wherein said updating of said table(s) includes updating said tables on each non-failed one of said multiple computers.
47. A method as in claim 42, wherein said updating of said table(s) includes updating said table(s) on each non-failed one of said multiple computers in which said first category memory location is replicated.
48. A method as in claim 46, including the further step of: a. not updating said table(s) of said first category memory location on each non-failed one of said multiple computers in which said first category memory location is not (or was not) replicated.
49. A method as in claim 44, wherein said updating of said replication table(s) includes updating said replication table(s) on each non-failed one of said multiple computers.
50. A method as in claim 44, wherein said updating of said replication table(s) includes updating said replication table(s) on each non-failed one of said multiple computers in which said existing first category memory location(s) is replicated.
51. A method as in claim 49, including the further step of: a. not updating said replication table(s) of said first category memory location on each non-failed one of said multiple computers in which said first category memory location is not (or was not) replicated.
52. A method as in claim 1, wherein said local memory(ies) of each said computer are independent of said local memory(ies) of each other computer.
53. A method as in claim 1, wherein said local processors may only access said local memory(ies) of the same computer in which the local processor is located.
54. A method as in claim 1, wherein at least a first application program written to operate on a single one of said computers, is operating substantially simultaneously on different ones of said multiple computers.
55. A method as in claim 54, wherein said application program operating substantially simultaneously on each of said different ones of said computers, may only access said local memory(ies) of the same computer.
56. A method as in claim 55, wherein said access is satisfied by said local memory(ies) of the same computer independently of (or without the aid of) said local memory(ies) of any other computer.
57. A method as in claim 55, wherein said access includes reading and/or writing content or values stored or resident within said local memory(ies) of the same computer.
58. A method as in claim 56, wherein said access includes reading and/or writing content or values stored or resident within said local memory(ies) of the same computer.
59. A method as in claim 55, wherein said access is restricted to reading and/or writing content or values stored or resident within said local memory(ies) of the same computer.
60. A method as in claim 56, wherein said access is restricted to reading and/or writing content or values stored or resident within said local memory(ies) of the same computer.
61. A method as in claim 55, wherein said access includes reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
62. A method as in claim 56, wherein said access includes reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
63. A method as in claim 57, wherein said access includes reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
64. A method as in claim 55, wherein said access is restricted to reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
65. A method as in claim 56, wherein said access is restricted to reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
66. A method as in claim 57, wherein said access is restricted to reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer.
67. A method as in claim 55, wherein at least one memory location and/or memory value of said application program is substantially similarly replicated in said local memory(ies) of said different ones of said multiple computers.
68. A method as in claim 64, wherein said substantially similarly replicated memory location(s) and/or value(s) are stored non-identically in said local memory(ies) of said different ones of said multiple computers.
69. A method as in claim 65, wherein said substantially similarly replicated memory location(s) and/or memory value(s) are updated through in-due-course updating to remain substantially similar upon occasion of any one of said plurality of computers simultaneously operating said application program modifying, or causing to be modified, the value(s) or content(s) of said substantially similarly replicated memory location(s) and/or memory value(s).
70. A method as in claim 65, wherein each said substantially similarly replicated memory location(s) and/or value(s) of each one of said multiple computers is identified with a substantially similar identifier.
71. A method as in claim 69, wherein said in-due-course updating provides that said replicated memory locations are updated to remain substantially similar upon occasion of any one of said computers simultaneously operating said application program causing modification of the contents of said replicated memory location.
72. A method as in claim 1, wherein: said local memory(ies) of each said computer are independent of said local memory(ies) of each other computer; said local processors may only access said local memory(ies) of the same computer in which the local processor is located; at least a first application program written to operate on a single one of said computers, is operating substantially simultaneously on different ones of said multiple computers; said application program operating substantially simultaneously on each of said different ones of said computers, may only access said local memory(ies) of the same computer; said access is satisfied by said local memory(ies) of the same computer independently of (or without the aid of) said local memory(ies) of any other computer; said access includes reading and/or writing content or values stored or resident within said local memory(ies) of the same computer, or said access is restricted to reading and/or writing content or values stored or resident within said local memory(ies) of the same computer; said access includes reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer, or said access is restricted to reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer; at least one memory location and/or memory value of said application program is substantially similarly replicated in said local memory(ies) of said different ones of said multiple computers; said substantially similarly replicated memory location(s) and/or value(s) are stored non-identically in said local memory(ies) of said different ones of said multiple computers; said substantially similarly replicated memory location(s) and/or memory value(s) are updated through in-due-course updating to remain substantially similar upon occasion of any one of said plurality of computers simultaneously operating said application program modifying, or causing to be modified, the value(s) or content(s) of said substantially similarly replicated memory location(s) and/or memory value(s); each said substantially similarly replicated memory location(s) and/or value(s) of each one of said multiple computers is identified with a substantially similar identifier; and said in-due-course updating provides that said replicated memory locations are updated to remain substantially similar upon occasion of any one of said computers simultaneously operating said application program causing modification of the contents of said replicated memory location.
73. A method as in claim 51, wherein: said local memory(ies) of each said computer are independent of said local memory(ies) of each other computer; said local processors may only access said local memory(ies) of the same computer in which the local processor is located; at least a first application program written to operate on a single one of said computers, is operating substantially simultaneously on different ones of said multiple computers; said application program operating substantially simultaneously on each of said different ones of said computers, may only access said local memory(ies) of the same computer; said access is satisfied by said local memory(ies) of the same computer independently of (or without the aid of) said local memory(ies) of any other computer; said access includes reading and/or writing content or values stored or resident within said local memory(ies) of the same computer, or said access is restricted to reading and/or writing content or values stored or resident within said local memory(ies) of the same computer; said access includes reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer, or said access is restricted to reading and/or writing content or values of real or virtual memory addresses of or resident within said local memory(ies) of the same computer; at least one memory location and/or memory value of said application program is substantially similarly replicated in said local memory(ies) of said different ones of said multiple computers; said substantially similarly replicated memory location(s) and/or value(s) are stored non-identically in said local memory(ies) of said different ones of said multiple computers; said substantially similarly replicated memory location(s) and/or memory value(s) are updated through in-due-course updating to remain substantially similar upon occasion of any one of said plurality of computers simultaneously operating said application program modifying, or causing to be modified, the value(s) or content(s) of said substantially similarly replicated memory location(s) and/or memory value(s); each said substantially similarly replicated memory location(s) and/or value(s) of each one of said multiple computers is identified with a substantially similar identifier; and said in-due-course updating provides that said replicated memory locations are updated to remain substantially similar upon occasion of any one of said computers simultaneously operating said application program causing modification of the contents of said replicated memory location.
74. A method as in claim 1, further comprising: maintaining a replication table listing or recording the ones of said multiple computers on which a said first category memory location is replicated; substantially simultaneously updating said first category memory locations of the other ones of said computers with any changes made to a first category memory location of any one of said computers; detecting failure of any one or a plurality of said multiple computers; not substantially simultaneously updating (or discontinuing the substantially simultaneous updating of) said failed computer(s) with any changes made to first category memory location(s); upon occasion of said failure by one(s) of said multiple computers, updating said table(s) of first category memory location(s) to exclude (or remove) said failed one(s) of said multiple computers; and said updating of said replication table(s) includes updating said replication table(s) on each non-failed one of said multiple computers.
75. A computer program stored on a computer readable memory device comprising instructions which, when executed on a computer, perform in at least one single computer capable of interoperating with at least one other computer coupled to at least one said single computer at least intermittently via a communications network to form a multiple computer system having a plurality of computers wherein each computer has a local memory and the multiple computer system operating in a replicated shared memory arrangement, a method of classifying said local memory(ies) comprising the steps of: classifying said local memories into a first category of memory locations each of which is replicated on two or more computers of said plurality of computers; classifying said local memories into a second category of memory locations each of which is present only in the specific one of said plurality of computers in which each said second category of memory location is physically located; and detecting a failure of at least one of said computers.
76. A multiple computer system comprising: a plurality of computers, each including a local processor and a local memory coupled with the local processor, and including a first computer and a second computer interconnected via a communications link or network operating in a replicated shared memory arrangement, a method of classifying said local memory(ies) comprising: classifying said local memories into a first category of memory locations each of which is replicated on two or more computers of said plurality of computers; classifying said local memories into a second category of memory locations each of which is present only in the specific one of said plurality of computers in which each said second category of memory location is physically located; and detecting a failure of at least one of said computers.
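The arrangement recited in claims 28, 32 to 34, 44 and 49 keeps a replication table in each computer and updates those tables on each non-failed computer when a failure occurs. It can be sketched as a hypothetical Python model under stated assumptions, not the claimed implementation: the `Computer` and `Cluster` classes and the `place`, `write` and `on_failure` methods are illustrative names, values are plain integers, and transmission over the communications network is modelled by direct assignment.

```python
from typing import Dict, List, Set


class Computer:
    """Hypothetical model of one computer: local memory plus its own replication table."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory: Dict[str, int] = {}
        # location -> names of the computers on which that location is replicated
        self.replication: Dict[str, Set[str]] = {}


class Cluster:
    """Hypothetical model of the multiple computers and the network between them."""

    def __init__(self, names: List[str]) -> None:
        self.computers = {n: Computer(n) for n in names}
        self.failed: Set[str] = set()

    def place(self, location: str, names: List[str], value: int) -> None:
        """Create `location` on each named computer, each with its own table copy."""
        for n in names:
            self.computers[n].memory[location] = value
            self.computers[n].replication[location] = set(names)  # new set per computer

    def write(self, writer: str, location: str, value: int) -> List[str]:
        """Write locally, then propagate to the other replicas in the writer's table.

        A second category location lists only the writer, so nothing is sent.
        """
        source = self.computers[writer]
        source.memory[location] = value
        holders = source.replication.get(location, set())
        targets = sorted(holders - {writer})
        for n in targets:
            self.computers[n].memory[location] = value  # stands in for a network message
        return targets

    def on_failure(self, failed: str) -> None:
        """Remove the failed computer from the replication tables of every non-failed computer."""
        self.failed.add(failed)
        for computer in self.computers.values():
            if computer.name in self.failed:
                continue
            for holders in computer.replication.values():
                holders.discard(failed)


cluster = Cluster(["M1", "M2", "M3"])
cluster.place("A", ["M1", "M2", "M3"], 0)  # first category
cluster.place("E", ["M2"], 0)              # second category, local to M2
print(cluster.write("M1", "A", 7))  # ['M2', 'M3']
print(cluster.write("M2", "E", 5))  # []
cluster.on_failure("M3")
print(cluster.write("M1", "A", 9))  # ['M2']
```

Giving each computer its own set object matters here: the failed computer's table is left untouched while the tables of the non-failed computers are pruned, so later writes are never directed at the failed computer.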